In this paper, we discuss equilibrium and perfect equilibrium in a
simplified model of the supergame. We assume that players can observe
the mixed moves employed by all players at each previous stage. For
this model, we obtain a complete characterization of the set of equilibrium
outcomes, and a fairly weak sufficient condition for this set to coincide
with the set of perfect equilibrium outcomes. Inter alia, we present simple
proofs of the Folk Theorem and of the result that the requirement of
perfection does not eliminate any equilibrium outcomes in the undiscounted
supergame.


Equilibrium and Perfection in Discounted Supergames, I: Public Lotteries

I. Introduction

This paper deals with some results on the characterisation of payoffs
sustainable by equilibria or perfect equilibria of infinitely repeated
games with discounting. This work extends previous work on supergames
without discounting by Aumann (1976), Aumann and Shapley, and
Rubinstein (1977). In this paper, we work with a simplified model of the
supergame used by Roth and Rubinstein (1977), in which players can observe
the behavioral strategies used by their opponents at the conclusion of each
play. In a subsequent paper, we show that passage to the more general
model, in which only the realisations of these strategies can be observed,
does not materially affect the results.

In the undiscounted case, where players evaluate the infinite streams
of payoffs accruing to them by the limit of means, should it exist, it
has been demonstrated that any outcome that is feasible and individually
rational in the stage game can be sustained by a Nash Equilibrium of the
supergame. In this context, an outcome is feasible if it belongs to the
convex hull of the pure-strategy payoffs, and is individually rational
if it yields each player a payoff at least as great as his minmax payoff
in the game. Moreover, it has been demonstrated that the additional
requirement of perfection does not affect the set of outcomes.

In the supergame with discounting, we shall find that neither of
these results goes through. In the first place, not all outcomes in the
convex hull of the pure-strategy payoffs are feasible. Of those outcomes
which are feasible, not all the individually rational ones can be
obtained as equilibrium outcomes of the discounted supergame, since
myopic players will not be deterred by the promise of eventual punishment.
Finally, not all equilibrium outcomes can be supported by perfect
equilibria, although there are several sufficient conditions for this,
satisfied by many games of theoretical and economic interest.

The organisation of the paper is as follows: in Section II we describe
the model of the supergame we are using. In Section III, we describe the
set of attainable outcomes in the discounted supergame. Section IV
characterises the set of equilibrium outcomes, while Section V describes
certain sufficient conditions for this set to coincide with the set of
perfect equilibrium outcomes, and provides a simple economic example of
the possibility that these sets may differ.

II. The model

The verbal description of the model is as follows: we begin with
a stage game in normal (strategic) form, with a finite number of players,
each of whom selects one of a finite number of pure strategies. This
game is to be played a denumerable infinity of times, and at each play,
the choices of the players are allowed to depend on the entire previous
history of the game. In particular, this means that each player is
allowed to observe the mixed strategy in the stage game used by each
of his opponents at each previous stage. This is a strong assumption,
and requires some justification. One justification is that one might
think of this as a situation in which only pure strategies are allowed,
and where the stage game has a continuous payoff function defined
over convex, compact and finite-dimensional pure strategy sets
for each player. Another interpretation is that the players meet on
successive days, but that play during each day consists of a sufficiently
large number of repetitions of the specified mixed strategies for that
day that each player can observe the mixtures used by his opponents with
probability arbitrarily close to unity. To this, we add the further
condition that discounting is done on a daily, rather than a continuous,
basis, and that players' strategies are fixed during the course of each
day's play. A stronger justification will be provided in the sequel,
where we demonstrate that the only effect of relaxing the assumption is
to shrink the set of attainable outcomes, and that the restrictions of
the sets of equilibrium and perfect equilibrium outcomes found in this
paper to the new set of attainable outcomes form the new sets of
equilibrium and perfect equilibrium outcomes. The result of an n-tuple of
supergame strategies is an infinite sequence of n-tuples of mixed
strategies in the stage game; to this, we associate a corresponding
infinite random sequence of payoffs. There are various ways for the
players to evaluate these sequences, but we shall concentrate on the
discounted sum, normalised to lie within the convex hull of the payoffs
in the stage game.

2.1 Definition: The stage game is a triple $[N,S,h]$, where $N$ is a finite
set of players; $S = \times_{i \in N} S_i$ is the set of n-tuples of pure
strategies (also finite), and $h: S \to \mathbb{R}^n$ is the payoff function.
We also define the mixed extension of the stage game to be the triple
$[N,M,H]$, where $N$ is as above; $M = \times_{i \in N} M_i$ is the set of
mixed strategies, with generic member $m = (m_1, \dots, m_n)$ where, for
each $i$, $m_i \in \Delta(S_i)$ is a probability distribution on the
members of $S_i$. Thus, $m$ is a probability distribution on the members
of $S$, although not all such probability distributions (called correlated
strategies) can be represented as members of $M$. If $m(s)$ is the
probability that the n-tuple of pure strategies $s$ will be played,

$$m(s_1, \dots, s_n) = \prod_{i=1}^{n} m_i(s_i),$$

we define the expected payoff $H(m)$ by

$$H(m) = \sum_{s \in S} m(s)\, h(s).$$

It is clearly a continuous function, linear in each of the numbers $m_i(s_i)$.
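A minimal sketch of the expected-payoff formula $H(m) = \sum_s m(s) h(s)$ for a finite stage game. It assumes, purely for illustration, that the payoffs are stored in a dictionary keyed by pure-strategy profiles and that each mixed strategy $m_i$ is a dictionary from pure strategies to probabilities; the function name is hypothetical. Exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import product
from math import prod


def expected_payoff(h, mixed):
    """Return H(m) = sum over s of m(s) * h(s) when players mix independently.

    h     -- dict mapping each pure profile s = (s_1, ..., s_n) to a payoff tuple
    mixed -- list of dicts; mixed[i][s_i] is the probability m_i(s_i)
    """
    n = len(mixed)
    total = [Fraction(0)] * n
    # Enumerate every pure profile s in S = S_1 x ... x S_n.
    for s in product(*(list(m_i) for m_i in mixed)):
        # m(s) = prod_i m_i(s_i): the players randomise independently.
        p = prod(mixed[i][s[i]] for i in range(n))
        for k in range(n):
            total[k] += p * h[s][k]
    return tuple(total)


# Battle of the Sexes payoffs (the example of Section III), both players mixing 50/50.
bos = {("T", "L"): (2, 1), ("T", "R"): (0, 0),
       ("B", "L"): (0, 0), ("B", "R"): (1, 2)}
half = Fraction(1, 2)
print(expected_payoff(bos, [{"T": half, "B": half}, {"L": half, "R": half}]))
```

The call returns $(3/4, 3/4)$: each coordinated outcome occurs with probability 1/4.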

2.2 Definition: Let $G = [N,S,h]$ be a stage game; $G^* = [N,F,P]$ is a
supergame of $G$ if the following are satisfied: $F = \times_{i \in N} F_i$
is the space of pure supergame strategies. For each member $f_i \in F_i$,
we write $f_i = (f_i^1, \dots, f_i^t, \dots)$, where

$$f_i^1 \in S_i, \qquad f_i^t : [S]^{t-1} \to S_i \quad (t \ge 2),$$

and $P = [P_1, \dots, P_n]$, where each $P_i$ is a partial ordering on the
space $\mathbb{R}^\infty$ of infinite sequences of real numbers. $P_i$ represents
the preferences of player $i$ over the infinite streams of payoffs resulting
from plays of the supergame.
2.3 Definition: Let $G^*$ be a supergame, and $f$ an n-tuple of pure
strategies for the supergame. We can calculate a sequence
$s(f) = (s^1(f), \dots, s^t(f), \dots)$ of outcomes in the stage game as follows:

$$s^1(f) = r^1(f) = (f_1^1, \dots, f_n^1) = f^1$$
$$s^t(f) = \big(f_1^t(r^{t-1}(f)), \dots, f_n^t(r^{t-1}(f))\big) = f^t(r^{t-1}(f))$$
$$r^t(f) = \big(r^{t-1}(f), s^t(f)\big)$$

Thus, $s^t(f)$ is the action specified by $f$ for the $t$-th play of the game
(if all previous plays have been according to $f$), and $r^t(f)$ is the
cumulative record of play up to and including the play on date $t$. To the
sequence $s(f)$ we can associate a sequence of payoffs
$g(f) = (g^1(f), \dots, g^t(f), \dots)$ in the obvious way, using the
pure-strategy payoff function $h$ from the stage game:

$$g_i^t(f) = h_i(s^t(f)),$$

and, for convenience, we shall write $g_i(f) = (g_i^1(f), \dots, g_i^t(f), \dots)$
and $g(f) = (g_1(f), \dots, g_n(f))$.
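The recursion for $s^t(f)$ and $r^t(f)$ can be sketched as a short loop. As a simplifying assumption of this sketch, each player's whole sequence $f_i = (f_i^1, f_i^2, \dots)$ is packed into one Python function of the record, with the empty record standing for date 1; the function and strategy names are hypothetical.

```python
def play_path(strategies, periods):
    """Return [s^1(f), ..., s^T(f)] for T = periods (Definition 2.3).

    strategies -- list of n functions; strategies[i](record) is player i's pure
                  action given the record r^{t-1}, a tuple of earlier profiles.
                  At date 1 the record is empty, so strategies[i](()) plays
                  the role of f_i^1.
    """
    record = ()                                          # r^0: nothing played yet
    for _ in range(periods):
        s_t = tuple(f_i(record) for f_i in strategies)   # s^t(f) = f^t(r^{t-1}(f))
        record = record + (s_t,)                         # r^t(f) = (r^{t-1}(f), s^t(f))
    return list(record)


# Hypothetical example with actions "H" (Helpful) and "G" (Greedy):
# player 1 copies player 2's previous action; player 2 always plays "G".
copy_last = lambda record: "H" if not record else record[-1][1]
always_greedy = lambda record: "G"
print(play_path([copy_last, always_greedy], 3))
```

Here the specified sequence is `[("H", "G"), ("G", "G"), ("G", "G")]`: player 1 starts helpfully and then imitates player 2.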

2.4 Definition: A discounted supergame is a supergame where each player
$i$ is characterised by a discount rate $\delta_i \in [0,1)$, and has the
preference relation $P_i$ defined by
$(x^1, \dots, x^t, \dots)\, P_i\, (y^1, \dots, y^t, \dots)$ (where $x, y \in \mathbb{R}^\infty$) iff

$$h_i^\delta(x) = (1 - \delta_i) \sum_{t=1}^{\infty} \delta_i^{t-1} x^t \;\ge\; (1 - \delta_i) \sum_{t=1}^{\infty} \delta_i^{t-1} y^t = h_i^\delta(y).$$

By an obvious abuse of notation, we can define a payoff function
$h^\delta : F \to \mathbb{R}^n$ for the discounted supergame:

$$h_i^\delta(f) = h_i^\delta(g_i(f)).$$
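As a small worked illustration of the normalised discounted sum $h_i^\delta$, the sketch below evaluates it exactly for an eventually periodic stream (a finite prefix followed by a repeating cycle), using the geometric-series formula for the repeated part. Restricting attention to eventually periodic streams is an assumption of the sketch, not of the paper, and the function name is hypothetical.

```python
from fractions import Fraction


def discounted_value(prefix, cycle, delta):
    """(1 - delta) * sum_{t>=1} delta^(t-1) x^t for x = prefix, cycle, cycle, ...

    Requires 0 <= delta < 1 and a non-empty cycle.
    """
    # Contribution of the finite prefix (dates 1, ..., len(prefix)).
    head = sum(delta ** t * x for t, x in enumerate(prefix))
    # One pass through the cycle, then the geometric sum over its repetitions.
    k = len(cycle)
    one_pass = sum(delta ** j * c for j, c in enumerate(cycle))
    tail = delta ** len(prefix) * one_pass / (1 - delta ** k)
    return (1 - delta) * (head + tail)


d = Fraction(1, 2)
print(discounted_value([], [3], d))      # constant stream: 3
print(discounted_value([], [2, 0], d))   # 2, 0, 2, 0, ...: 2 / (1 + delta) = 4/3
print(discounted_value([4], [1], d))     # 4 once, then 1 forever: (1 - delta) * 4 + delta = 5/2
```

The normalisation by $(1 - \delta_i)$ is what keeps a constant stream at its stage value, so that $h^\delta(F)$ lies in $CH(h(S))$. The last stream has the shape of a one-shot defection followed by permanent punishment, which reappears in Section IV.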

One of the nicest features of the discounted supergame is that, with this
payoff function, the set of payoffs $h^\delta(F)$ is a compact subset of
$CH(h(S))$, and that $h^\delta(f)$ is continuous in $f$. Other preference
relations have been used, including the first Cesàro mean of the payoffs
and the overtaking relation. However, the first of these gives no answer
for payoff sequences that are not Cesàro summable, while the second cannot
be represented by a payoff function. The next problem is that of mixed
strategies. Since these supergames are games of perfect recall, it follows
from Aumann's (1964) extension of Kuhn's (1953) theorem that it is
sufficient to confine our attention to behavioral strategies; in this case,
a behavioral strategy is a device which selects a mixed strategy in each
stage game.


2.5 Definition: Let $G^*$ be a supergame, and define $\bar{F} = \times_{i \in N} \bar{F}_i$
to be the space of n-tuples of behavioral strategies for the supergame.
The generic member $\bar{f}_i \in \bar{F}_i$ is defined by:

$$\bar{f}_i^1 \in M_i, \qquad \bar{f}_i^t : [M]^{t-1} \to M_i \quad (t \ge 2).$$

For any n-tuple $\bar{f}$ of behavioral strategies, we can calculate a sequence
$m(\bar{f})$ of mixed stage-game outcomes as follows:

$$m^1(\bar{f}) = (\bar{f}_1^1, \dots, \bar{f}_n^1) = \bar{f}^1 = r^1(\bar{f})$$
$$m^t(\bar{f}) = \big(\bar{f}_1^t(r^{t-1}(\bar{f})), \dots, \bar{f}_n^t(r^{t-1}(\bar{f}))\big) = \bar{f}^t(r^{t-1}(\bar{f}))$$
$$r^t(\bar{f}) = [r^{t-1}(\bar{f}), m^t(\bar{f})]$$

Since the choices are independent at each stage, the expected payoff
sequence is well defined by $G(\bar{f}) = [G_i^t(\bar{f}) : i \in N,\ t = 1, 2, \dots]$,
where $G_i^t(\bar{f}) = H_i(m^t(\bar{f}))$. The discounted-supergame "payoff
function" resulting from this definition is $H^\delta : \bar{F} \to \mathbb{R}^n$, where

$$H_i^\delta(\bar{f}) = (1 - \delta_i) \sum_{t=1}^{\infty} \delta_i^{t-1} G_i^t(\bar{f}).$$
It only remains to define equilibrium and perfect equilibrium for
the discounted supergame.

2.6 Definition: Let $\bar{f} \in \bar{F}$ be an n-tuple of behavioral strategies for
the discounted supergame $G^*$. We say that $\bar{f}$ is a Nash Equilibrium iff,
for each player $i$ and each behavioral strategy $\bar{f}_i' \in \bar{F}_i$, we have

$$H_i^\delta(\bar{f}) \ge H_i^\delta(\bar{f}_i', \bar{f}_{(i)}),$$

where $\bar{f}_{(i)}$ denotes the (n-1)-tuple $(\bar{f}_1, \dots, \bar{f}_{i-1}, \bar{f}_{i+1}, \dots, \bar{f}_n)$
of behavioral strategies used by the other players. $\bar{f}$ is a perfect
equilibrium iff, for every $t$ and for every member $m'$ of $[M]^t$, the
"continuation strategy" $\bar{f}(\cdot : m')$ defined by

$$\bar{f}_i^\tau(m^1, \dots, m^{\tau-1} : m') = \bar{f}_i^{t+\tau}(m', m^1, \dots, m^{\tau-1})$$

is an equilibrium in $G^*$. As a further matter of notation, we shall denote
the set of members of $CH(h(S))$ that can be achieved as $H^\delta(\bar{f})$ for
some Nash Equilibrium $\bar{f}$ by e.p., while the subset of e.p. that can be
achieved by perfect equilibria will be denoted p.e.p.
III. The Set of Attainable Outcomes

This section concerns the observation that, for sufficiently small
values of the individual discount rates, it may happen that not all
members of $CH(h(S))$ can be achieved in behavioral strategies, let alone
in pure strategies. This is in sharp contrast to the situation for the
undiscounted game, where any point in $CH(h(S))$ can be achieved in pure
strategies, by playing the relevant pure-strategy n-tuples of the stage
game with frequencies that correspond to the weights used in the convex
combinations forming $CH(h(S))$.

For simplicity, we work with the case where $\delta_i = \delta$ for all $i$. In
any game $[N,S,h]$, we can isolate three subsets of $CH(h(S))$, corresponding
to the outcomes that can be achieved using pure, mixed and correlated
strategies in each stage. Letting $C = CH(S)$ be the set of correlated
strategies for the stage game (the probability distributions on $S$), we
define:

3.1 Definition:

$$D_p^\delta = \Big\{x \in CH(h(S)) : x = (1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} h(s^t) \text{ for some infinite sequence } (s^1, \dots, s^t, \dots) \text{ of members of } S\Big\}$$

$$D_m^\delta = \Big\{x \in CH(h(S)) : x = (1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} H(m^t) \text{ for some infinite sequence } (m^1, \dots, m^t, \dots) \text{ of members of } M\Big\}$$

$$D_c^\delta = \Big\{x \in CH(h(S)) : x = (1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} H(c^t) \text{ for some infinite sequence } (c^1, \dots, c^t, \dots) \text{ of members of } C\Big\}$$
Clearly, since $H(C) = CH(h(S))$, we have $D_c^\delta = CH(h(S))$ for all $\delta$.
Moreover, in cases where $H(M) = CH(h(S))$, as with Prisoner's Dilemma, we
have $D_m^\delta = CH(h(S))$. In general, the determination of the set of
attainable points will hinge on the set of weights that can be obtained
through the use of the relevant strategies. For example, if the discount
rate is $\delta = 0.1$, the first pure (or mixed) payoff will have a weight
of 0.9; this means that the eventual discounted sum must be close to one of
the original pure or mixed strategy payoffs for the stage game. To see what
this means in terms of $D_p^\delta$, let us observe that the weight given to a
particular pure-strategy outcome must be of the form

$$(1 - \delta) \sum_{t=1}^{\infty} a_t \delta^{t-1},$$

where $a = (a_1, \dots, a_t, \dots)$ is some infinite sequence of 0's and 1's.
If we are taking convex combinations of $m$ such pure-strategy outcomes, we
need a characterisation of the feasible convex combinations.


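3.2 Definition: Let $\delta \in (0,1)$, and let $\Delta^m$ be the standard m-simplex
(the set of probability vectors in $\mathbb{R}^m$). Define

$$\Delta^m_\delta = \{\lambda \in \Delta^m : \text{there exist sequences } a_1, \dots, a_m \text{ satisfying (i), (ii) and (iii)}\},$$

where

i) $a_i = (a_i^1, \dots, a_i^t, \dots)$, with $a_i^t \in \{0, 1\}$ for each $i, t$;
ii) for all $t$, $\sum_{i=1}^{m} a_i^t = 1$; and
iii) $\lambda_i = (1 - \delta) \sum_{t=1}^{\infty} a_i^t \delta^{t-1}$ for each $i$.

This is the set of weights on pure-strategy payoffs available via the use
of pure strategies in the discounted supergame. We have the following
result.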
3.3 Proposition: $\delta \ge \frac{m-1}{m}$ iff $\Delta^m_\delta = \Delta^m$.

Proof: If $a_i^1 = 1$, then $\lambda_i \ge 1 - \delta$, so a necessary condition for the
conclusion of the Proposition is that $\max_i \lambda_i \ge 1 - \delta$ for every
$\lambda \in \Delta^m$. But $\lambda \in \Delta^m$ implies that $\max_i \lambda_i \ge \frac{1}{m}$, and this bound is
tight (it is attained at $\lambda = (\frac{1}{m}, \dots, \frac{1}{m})$), so we know that the
necessary condition is satisfied only if

$$\frac{1}{m} \ge 1 - \delta, \quad\text{or}\quad \delta \ge \frac{m-1}{m}.$$

It remains to be shown that no further restrictions result from the choice
of subsequent weights. To do this, we shall exhibit a procedure by which
the $a_i^t$ can be calculated explicitly. To begin with, fix $\lambda \in \Delta^m$ and
set $\lambda^1 = \lambda$. Let $i_1 \in \arg\max_i \lambda^1_i$. We know that $\lambda^1_{i_1} \ge \frac{1}{m}$, so
that, by hypothesis, $\lambda^1_{i_1} \ge 1 - \delta$. Therefore, let us set $a^1_{i_1} = 1$ and
$a^1_i = 0$ for all $i \ne i_1$, and form a new vector $\lambda^2$ by $\lambda^2_j = \lambda^1_j$ for
$j \ne i_1$ and $\lambda^2_{i_1} = \lambda^1_{i_1} - (1 - \delta)$. This new vector belongs to

$$\Big\{\lambda \in \mathbb{R}^m_+ : \sum_{j=1}^{m} \lambda_j = 1 - (1 - \delta) = \delta\Big\}.$$

Once again, we know that $\max_i \lambda^2_i \ge \frac{\delta}{m}$. The condition for us to be
able to choose $a^2$ appropriately is that $\max_i \lambda^2_i \ge (1 - \delta)\delta$, so that
this condition is satisfied if

$$\frac{\delta}{m} \ge (1 - \delta)\delta, \quad\text{or}\quad \delta \ge \frac{m-1}{m},$$

so that successive levels of choice of $a^t$ introduce no new conditions.

We have now demonstrated the truth of the Proposition, since the procedure
begun above can be iterated by choosing $i_t \in \arg\max_i \lambda^t_i$, setting
$a^t_{i_t} = 1$ and $a^t_i = 0$ for all $i \ne i_t$, and letting $\lambda^{t+1}_j = \lambda^t_j$
for $j \ne i_t$, while

$$\lambda^{t+1}_{i_t} = \lambda^t_{i_t} - (1 - \delta)\delta^{t-1}.$$

It is clear that for all $t$ and $i$, $\lambda^t_i \ge 0$, and that for all $i$,
$\lim_{t \to \infty} \lambda^t_i = 0$ (since $\sum_i \lambda^t_i = \delta^{t-1}$). Moreover, from the
construction it follows that for all $T$,

$$\lambda_i = \lambda^{T+1}_i + (1 - \delta) \sum_{t=1}^{T} a^t_i \delta^{t-1},$$

which completes the proof. QED

When we turn to mixed strategies, we should expect some relaxation of
this condition. Indeed, in many cases we can achieve the entire set of
outcomes. We shall not continue with our characterisation of the set of
attainable outcomes, since there is insufficient generality to warrant it.
However, we shall observe that there is a natural upper bound for $m$ that
in most cases is less than the number of pure-strategy combinations:
$CH(h(S)) \subset \mathbb{R}^n$, so that, by Carathéodory's theorem, we need use no
more than $n+1$ pure-strategy combinations. On the other hand, if $n \ge 2$
and $|S_i| \ge 2$ for each $i$, then $|S| > n+1$, so we obtain:

3.4 Theorem: If $\delta \ge 1 - \frac{1}{n+1}$, then the set of outcomes obtainable via
pure strategies coincides with $CH(h(S))$.

We close this section with an example of a well-known game where, for
small $\delta$, $D_p^\delta \subsetneq D_m^\delta \subsetneq D_c^\delta$: the Battle of the Sexes. This is a
two-player game in which each player has two pure strategies (the row
player chooses T or B, the column player L or R; each entry lists the row
player's payoff first):

         L        R
   T   (2,1)    (0,0)
   B   (0,0)    (1,2)

To begin with, the following three figures show the stage-game payoffs
to pure, mixed and correlated strategies, respectively.

[Figure: stage-game payoffs of the Battle of the Sexes; panels: pure, mixed, correlated]
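Returning briefly to Proposition 3.3: the constructive procedure in its proof is a greedy algorithm, in which each date's whole weight $(1-\delta)\delta^{t-1}$ goes to a coordinate with the largest remaining weight. A minimal sketch follows; it truncates after finitely many dates (the proof runs the procedure forever), uses exact rational arithmetic, and the function name is hypothetical.

```python
from fractions import Fraction


def greedy_weights(lam, delta, periods):
    """Run the procedure from the proof of Proposition 3.3 for finitely many dates.

    lam     -- target weights lambda in the simplex (Fractions summing to 1)
    delta   -- discount factor, a Fraction in (0, 1)
    periods -- number of dates T to construct

    Returns (a, residual): a[t][i] is the 0/1 indicator a_i^{t+1}, and
    residual is lambda^{T+1}, the weight not yet assigned after T dates.
    """
    m = len(lam)
    residual = list(lam)                       # lambda^1 = lambda
    a = []
    for t in range(periods):                   # loop index t corresponds to date t + 1
        w = (1 - delta) * delta ** t           # weight (1 - delta) * delta^(date - 1)
        i_t = max(range(m), key=lambda i: residual[i])   # i_t in argmax_i lambda_i^t
        if residual[i_t] < w:
            raise ValueError("largest remaining weight too small; need delta >= (m - 1)/m")
        row = [0] * m
        row[i_t] = 1                           # a_{i_t} = 1 and a_i = 0 otherwise
        residual[i_t] -= w                     # lambda^{t+1}_{i_t} = lambda^t_{i_t} - w
        a.append(row)
    return a, residual


d = Fraction(2, 3)                             # delta = (m - 1)/m for m = 3
lam = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
a, residual = greedy_weights(lam, d, 20)
# The assigned weights plus the unassigned residual reproduce lambda exactly.
rebuilt = [residual[i] + sum((1 - d) * d ** t for t in range(20) if a[t][i])
           for i in range(3)]
print(rebuilt == lam)
```

The check prints `True`, and the residual sums to $\delta^{20}$, which shrinks to zero as in the proof. Calling the function with $\delta = 1/2$ and $\lambda = (1/3, 1/3, 1/3)$ instead raises the `ValueError` at the first date, matching the necessity half of the Proposition.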


To construct the set of payoffs obtainable via pure strategies, given a
small discount rate, we first attach near each pure-strategy payoff $h(s)$
the shrunken replica $(1 - \delta)h(s) + \delta\, CH(h(S))$ of the convex hull of the
original payoffs. The final payoff must lie within one of these convex
hulls; which one is determined by the choice of the first-period strategy
combination. Next, we repeat the process within each of the new convex
hulls, adding shrunken replicas at the vertices representing the pure
choices at the first and second stages. The process continues inductively,
resulting in a sparse, nondenumerable set of payoffs.
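The inductive construction just described can be sketched numerically. After $T$ stages, every point of $D_p^\delta$ lies in a replica $x + \delta^T CH(h(S))$, where $x = (1-\delta)\sum_{t=1}^{T}\delta^{t-1}h(s^t)$ for some choice of $s^1, \dots, s^T$; the sketch enumerates these replica anchors for the Battle of the Sexes. The function name and the truncation at finitely many stages are our own choices for illustration.

```python
from fractions import Fraction
from itertools import product

# Pure-strategy payoffs h(s) of the Battle of the Sexes: (T,L), (T,R), (B,L), (B,R).
BOS_PAYOFFS = [(2, 1), (0, 0), (0, 0), (1, 2)]


def stage_anchors(payoffs, delta, stages):
    """Points (1 - delta) * sum_{t=1}^{stages} delta^(t-1) h(s^t) over all pure sequences.

    Each anchor x carries the shrunken replica x + delta^stages * CH(h(S)),
    and D_p^delta is contained in the union of these replicas.
    """
    anchors = set()
    for seq in product(payoffs, repeat=stages):
        x = sum((1 - delta) * delta ** t * h[0] for t, h in enumerate(seq))
        y = sum((1 - delta) * delta ** t * h[1] for t, h in enumerate(seq))
        anchors.add((x, y))        # a set, because (T,R) and (B,L) give equal payoffs
    return anchors


d = Fraction(1, 10)
print(len(stage_anchors(BOS_PAYOFFS, d, 1)), len(stage_anchors(BOS_PAYOFFS, d, 2)))
```

With $\delta = 1/10$ the three distinct stage payoffs give 3 anchors after one stage and 9 after two, and the stage-2 anchors are exactly $(1-\delta)h(s) + \delta x$ for the stage-1 anchors $x$: the self-similarity behind the picture. For larger $\delta$ the replicas begin to overlap, and by Theorem 3.4 (with $n = 2$) the set fills $CH(h(S))$ once $\delta \ge 2/3$.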


[Figure: construction of the set of payoffs obtainable via pure strategies; panels: Stage 1, Stage 2, Stage 3]
To construct $D_m^\delta$, the process is only slightly more complicated. For
each point in the set of payoffs obtainable via mixed strategies, there
will be a shrunken replica of that set, and we must take the envelope
of these attached replicas. In the following figures, we show a few of
these attached replicas, and the envelope of those replicas.


[Figure: first stage; left panel, some replicas; right panel, the envelope]


The next step is to repeat the process for each point added at the
second step: this means that we must remain within the envelope of the
convex hulls of the shrunken replicas added at the first stage. This
envelope, shown in the left-hand figure below, represents the maximum
area we can hope for. In the figure on the right, we have illustrated
the result of this second step.


[Figure: left panel, the outer limit; right panel, the second-stage envelope]

These pictures make it clear that, for small $\delta$, $D_p^\delta \subsetneq D_m^\delta \subsetneq D_c^\delta$.

IV. Equilibrium in the discounted supergame

Any n-tuple of supergame strategies can be divided into two parts:
the specified sequence $s(f)$ that results from adherence to the strategies
$f$, and various contingent sequences resulting from deviations from the
specified sequence. Since each player knows the strategy descriptions
of the other players in the equilibrium, each player can predict the
future course of play for any choice of his own actions.

One immediate consequence is that any n-tuple of supergame strategies
whose specified sequence consists of equilibria of the stage game, and
which makes the same prescription regardless of history, is an
equilibrium. Thus, any sequence of members of the set of stage-game
equilibria can be sustained as the outcome of such an "open loop"
equilibrium for any monotonic evaluation relation.

In general we shall be concerned with outcomes that cannot be
achieved in this manner, so that we shall need to consider the concept
of punishment. The worst punishment that can be inflicted on a player
in any single play of the game is that which holds him to his minmax
security level. There are two reasons for using this security level
rather than the (lower) maxmin level. The first is that the punishment
to be used against a defector forms part of the declared strategies of
the other players, so that the defecting player can adapt his "defense"
to the specific punishment. The second is that in this game all
lotteries are public, so that there is no way that the other players can
use a correlated punishment against the defector. Of course, if they
were able to use correlated punishments, there would be no difference
between the minmax and maxmin levels; both would coincide with the value
of the two-person zero-sum game played between the defector (the
maximising player) and the others (the minimising player), over the
defector's payoffs.
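In a two-player game the punishing coalition is a single player, so the distinction just drawn disappears in mixed strategies (by the minimax theorem); the minmax level itself is still worth computing. The sketch below assumes the punisher has exactly two pure actions, so a punishment is a single probability $q$. The punished player's best-reply payoff is then the upper envelope of straight lines in $q$, which is convex and piecewise linear, so its minimum is attained at $q = 0$, $q = 1$ or a crossing of two lines. The function name and input layout are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def minmax_two_actions(rows):
    """Minmax level min_q max_r [q * a_r + (1 - q) * b_r] of the punished player.

    rows -- list of (a_r, b_r): the punished player's payoff from pure action r
            when the opponent plays its first / second pure action.
    Returns (v, q): the security level v and a minimising punishment q, the
    probability the opponent puts on its first action.
    """
    def upper(q):
        # Best-reply payoff of the punished player against the punishment q.
        return max(q * a + (1 - q) * b for a, b in rows)

    candidates = {Fraction(0), Fraction(1)}
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        gap = (a1 - b1) - (a2 - b2)
        if gap != 0:
            q = Fraction(b2 - b1) / gap        # crossing point of the two payoff lines
            if 0 <= q <= 1:
                candidates.add(q)
    q_star = min(candidates, key=upper)
    return upper(q_star), q_star


print(minmax_two_actions([(2, 0), (0, 1)]))   # Battle of the Sexes, row player
print(minmax_two_actions([(1, 4), (0, 3)]))   # Prisoner's Dilemma, rows (Greedy, Helpful)
```

For the Battle of the Sexes row player this gives $v = 2/3$, attained when the column player puts probability $1/3$ on L. For the Prisoner's Dilemma used at the end of this section it gives $v = 1$, with the opponent playing Greedy for sure.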

From this observation, it follows that the strongest punishment
that can be inflicted on a defector in the supergame is to hold him to
his minmax level in all plays following the detection of a deviation.
Such a punishment is called a grim punishment. We observe that a player
can be deterred from defecting from a particular specified sequence if
and only if the threat of grim punishment is sufficient to deter him.

4.1 Definition: Let $[N,M,H]$ be the mixed extension of a normal-form
game. For each $i \in N$, define $p^i \in M$ as follows:

$$p^i_{(i)} \in \arg\min_{m_{(i)}} \Big[\max_{m_i} H_i(m_i, m_{(i)})\Big]$$
$$p^i_i \in \arg\max_{m_i} H_i(m_i, p^i_{(i)})$$

This is the minmax punishment and defense to be used when player $i$
defects. Let $v_i = H_i(p^i)$ be player $i$'s minmax security level.

Let $[\bar{m}^1, \dots, \bar{m}^t, \dots]$ be an infinite sequence of members of $M$. The
grim strategy supporting $\bar{m} = [\bar{m}^1, \dots]$ is the n-tuple $f$ of supergame
strategies defined by $f_i^1 = \bar{m}_i^1$ for all $i$, and, for $t \ge 2$,
$f_i^t(m^1, \dots, m^{t-1}) = p^j_i$ iff there exists $\tau < t$ such that

i) $m^{t'} = \bar{m}^{t'}$ for all $t' < \tau$;
ii) $m^\tau_k = \bar{m}^\tau_k$ for all $k \ne j$;
iii) $m^\tau_j \ne \bar{m}^\tau_j$;

and $f_i^t(m^1, \dots, m^{t-1}) = \bar{m}^t_i$ if not.

In other words, the grim strategy plays according to the cooperative
sequence until the first date when a defection occurs. If a single player
is responsible for that defection, then that player is punished forever.
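The case distinction in the grim strategy can be sketched as a function of the observed record. Assumptions of this sketch: players are indexed from 0; mixed strategies are stored in any form that can be compared for equality (public lotteries are what make that comparison possible); the specified sequence is supplied as a function of the date; and `punish[j]` holds the profile $p^j$. All names are hypothetical.

```python
def grim_action(i, history, target, punish):
    """Player i's mixed strategy f_i^t(m^1, ..., m^{t-1}) under the grim strategy.

    history -- tuple (m^1, ..., m^{t-1}) of observed mixed profiles
    target  -- function tau -> specified profile bar-m^tau (dates start at 1)
    punish  -- punish[j] is the profile p^j used once player j has defected
    """
    for tau, played in enumerate(history, start=1):
        specified = target(tau)
        deviators = [k for k, (x, y) in enumerate(zip(played, specified)) if x != y]
        if deviators:
            # tau is the first date at which play left bar-m, so condition (i) holds.
            if len(deviators) == 1:
                # Conditions (ii) and (iii): a lone defector j is punished forever.
                return punish[deviators[0]][i]
            # After a joint first deviation no date satisfies (i) to (iii),
            # so play simply follows bar-m.
            break
    return target(len(history) + 1)[i]


# Prisoner's Dilemma, mixed strategies as the probability of "Greedy":
# bar-m^t = (0, 0) (both Helpful) for every t; p^0 = p^1 = (1, 1) (both Greedy).
target = lambda tau: (0, 0)
punish = [(1, 1), (1, 1)]
print(grim_action(1, ((0, 0), (1, 0)), target, punish))   # player 0 defected at date 2
```

The call returns 1: having seen player 0 defect at date 2, player 1 switches to Greedy for good.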

In the undiscounted game, a player contemplating defection from a
grim strategy supporting a specified sequence $\bar{m}$ has the choice of two
outcomes: $\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} H_i(\bar{m}^t)$ if he does not defect, and $v_i$
if he does. If the first limit exists, i.e., if the sequence $H_i(\bar{m}^t)$ is
Cesàro-convergent, player $i$ will adhere to the grim strategy iff the first
of these numbers is at least as large as the second. This is the "first
Folk Theorem" of the undiscounted supergame:

4.2 Theorem: The set of limiting-average payoffs to equilibria of the
undiscounted supergame is $\{y \in CH(h(S)) : y_i \ge v_i \text{ for all } i \in N\}$.

One striking feature of this result is that the outcomes can be
characterised purely by their payoffs; no strategic considerations enter
in. In particular, the immediate profit earned by the defector plays no
role. Unfortunately, this is not true in the discounted game, so that
the characterisation of equilibrium outcomes involves explicitly the
strategic aspects of the specified sequence. However, it is still the
case that the sine qua non of equilibrium is the existence of a
grim-strategy equilibrium supporting the outcome, so that we obtain:
4.3 Theorem: $y \in D_m^\delta$ is the outcome of an equilibrium of the discounted
supergame iff there exists an infinite sequence $(\bar{m}^1, \dots, \bar{m}^t, \dots) = \bar{m}$
of members of $M$ with the following properties:

i) for all $i$,

$$y_i = (1 - \delta_i) \sum_{t=1}^{\infty} \delta_i^{t-1} H_i(\bar{m}^t);$$

ii) for all $i$ and $T$,

$$\sum_{t \ge T} \delta_i^{t-T} H_i(\bar{m}^t) \ge \max_{m_i} H_i(m_i, \bar{m}^T_{(i)}) + \frac{\delta_i}{1 - \delta_i} v_i \tag{1}$$

or, equivalently,

$$y_i \ge (1 - \delta_i)\Big[\delta_i^{T-1} \max_{m_i} H_i(m_i, \bar{m}^T_{(i)}) + \sum_{t=1}^{T-1} \delta_i^{t-1} H_i(\bar{m}^t)\Big] + \delta_i^{T} v_i. \tag{2}$$

In particular, if $y \in H(M)$, $y$ is the outcome of a stationary equilibrium
of the discounted supergame iff there exists $m^* \in M$ s.t.

iii) $H(m^*) = y$;
iv) for all $i \in N$,

$$y_i \ge (1 - \delta_i) \max_{m_i} H_i(m_i, m^*_{(i)}) + \delta_i v_i. \tag{3}$$

Proof:

Stationary equilibrium: Let us suppose that player $i$ wishes to defect
from a stationary grim strategy supporting the sequence $m^*, m^*, \dots$. If
he does not defect, his payoff will be $y_i$; if he does defect, his payoff
will be at most $\max_{m_i} H_i(m_i, m^*_{(i)})$ in the first period and at most $v_i$
in all subsequent periods. The normalised discounted payoff to his best
defection is therefore the RHS of condition (3). This shows the
sufficiency of the condition. Necessity follows from the following
observation: in any supergame strategy combination with stationary
specified sequence $m^*, m^*, \dots$, the payoff to the best defection will be
greater than or equal to the payoff to the best defection against the
grim strategy supporting this outcome.

In general, this will be true: if $\bar{m}$ is any infinite sequence, and
$\bar{f}$ is any supergame strategy supporting this sequence (i.e.,
$m(\bar{f}) = \bar{m}$), then

$$\max_{\bar{f}_i'} H_i^\delta(\bar{f}_i', \bar{f}_{(i)}) \ge \max_{\bar{f}_i'} H_i^\delta(\bar{f}_i', g_{(i)}),$$

where $g$ is the grim strategy supporting $\bar{m}$.

Nonstationary equilibrium: In general, it may be the case that $D_m^\delta$
strictly contains $H(M)$. If player $i$ chooses to defect from a grim
strategy $g$ supporting the sequence $\bar{m}$ at time $T$, he exchanges an
expected payoff sequence worth

$$\sum_{t \ge T} \delta_i^{t-T} H_i(\bar{m}^t)$$

for one which pays at most $\max_{m_i} H_i(m_i, \bar{m}^T_{(i)})$ on day $T$, and $v_i$ in
all subsequent periods, for a total expected payoff of

$$\max_{m_i} H_i(m_i, \bar{m}^T_{(i)}) + \frac{\delta_i v_i}{1 - \delta_i}$$

as of day $T$; these two amounts give us the LHS and the RHS of (1),
respectively. Finally, we obtain condition (2) by multiplying (1) by
$(1 - \delta_i)\delta_i^{T-1}$, adding $(1 - \delta_i)\sum_{t=1}^{T-1} \delta_i^{t-1} H_i(\bar{m}^t)$ to both sides,
and applying condition (i). QED
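Condition (1) can be checked mechanically when the specified sequence is periodic, because the continuation sum $\sum_{t \ge T}\delta_i^{t-T}H_i(\bar m^t)$ is then a geometric series and only the dates $T$ within one period need to be examined. The sketch assumes two players given by payoff matrices, profiles given as pairs of probability vectors, and minmax levels supplied by the caller; the periodic restriction and all names are assumptions made for illustration.

```python
from fractions import Fraction


def stage(i, A, B, x, y):
    """Expected stage payoff H_i of the profile (x, y); A, B are the players' matrices."""
    M = A if i == 0 else B
    return sum(x[r] * M[r][c] * y[c] for r in range(len(x)) for c in range(len(y)))


def best_deviation(i, A, B, x, y):
    """max over m_i of H_i(m_i, m_(i)); a pure best reply suffices by linearity."""
    if i == 0:
        return max(sum(A[r][c] * y[c] for c in range(len(y))) for r in range(len(A)))
    return max(sum(x[r] * B[r][c] for r in range(len(x))) for c in range(len(B[0])))


def grim_supports_cycle(A, B, cycle, delta, v):
    """Check condition (1) of Theorem 4.3 for the sequence cycle, cycle, ...

    cycle -- list of profiles (x, y); delta -- (delta_1, delta_2), each in [0, 1);
    v     -- (v_1, v_2), the minmax security levels.
    """
    k = len(cycle)
    for i in (0, 1):
        d = delta[i]
        for T in range(k):
            # Continuation value at date T: one pass of the cycle, then a geometric sum.
            cont = sum(d ** j * stage(i, A, B, *cycle[(T + j) % k]) for j in range(k)) / (1 - d ** k)
            x, y = cycle[T]
            if cont < best_deviation(i, A, B, x, y) + d * v[i] / (1 - d):
                return False
    return True


# Prisoner's Dilemma with actions ordered (Greedy, Helpful); v_1 = v_2 = 1.
A = [[1, 4], [0, 3]]          # player 1's payoffs
B = [[1, 0], [4, 3]]          # player 2's payoffs
HH = ((0, 1), (0, 1))         # both Helpful
half, fifth = Fraction(1, 2), Fraction(1, 5)
print(grim_supports_cycle(A, B, [HH], (half, half), (1, 1)))
print(grim_supports_cycle(A, B, [HH], (fifth, fifth), (1, 1)))
```

Mutual helpfulness passes at $\delta_i = 1/2$ and fails at $\delta_i = 1/5$, in line with condition (3): $3 \ge (1-\delta)\cdot 4 + \delta \cdot 1$ holds iff $\delta \ge 1/3$.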

We shall present an example which uses this theorem to characterise
the equilibrium points of Prisoner's Dilemma. Before we do so, there
are several consequences of this theorem that are worth noting. In the
first place, by letting all the discount rates go to 1 we obtain precisely
the "first Folk Theorem": any feasible and individually rational payoff
can be supported by an equilibrium of the undiscounted game. Strictly
speaking, this gives us a version of the Folk Theorem where players use
the Abel limit, rather than the first Cesàro mean, to evaluate payoff
streams. However, this poses no problems, since Cesàro convergence
implies Abel convergence to the same limit.

Another interesting feature of this result can be noted by letting
the discount rate shrink to 0: condition (3) becomes the usual condition
for Nash Equilibrium, while condition (1) limits us to precisely those
sequences with which we began this section: sequences calling for a
stage-game Nash equilibrium at every stage.

Finally, it will be noted that condition (1) can be written:

$$\inf_{T} \Big[\sum_{t \ge T} \delta_i^{t-T} H_i(\bar{m}^t) - \max_{m_i} H_i(m_i, \bar{m}^T_{(i)})\Big] \ge \frac{\delta_i v_i}{1 - \delta_i}, \tag{4}$$

and it is clear that this is monotonic in $\delta_i$: if $y$ is an equilibrium
outcome at $\delta = (\delta_1, \dots, \delta_n)$ and $\delta_i' \ge \delta_i$ for each $i \in N$, then $y$ is
an equilibrium outcome at $\delta'$.

We conclude this section by characterising the outcomes of
equilibria of a simple version of the Prisoner's Dilemma. The mixed
extension of this game is as follows: let $m_i$ be the probability that player
$i$ uses the "Greedy" strategy, and $1-m_i$ the probability that player
$i$ uses the "Helpful" strategy. The payoff functions are:

$$H_1(m_1,m_2)=3+m_1-3m_2,\qquad H_2(m_1,m_2)=3+m_2-3m_1.$$


It follows that, for any pair $(m_1,m_2)$, the best defection is $m_i=1$,
so that

$$\max_{m_1}H_1(m_1,m_2)=4-3m_2;\qquad \max_{m_2}H_2(m_1,m_2)=4-3m_1;\qquad\text{and}\qquad v_1=1=v_2.$$

Thus, we can determine that $(m_1,m_2)$ is the outcome of a stationary
supergame equilibrium iff:

$$3+m_1-3m_2\ \ge\ (1-\delta_1)(4-3m_2)+\delta_1;\ \text{and}\tag{5}$$

$$3+m_2-3m_1\ \ge\ (1-\delta_2)(4-3m_1)+\delta_2.\tag{6}$$


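We can rearrange these linear inequalities to give a unified condition
on $m$: (5) is equivalent to $1-m_1\le 3\delta_1(1-m_2)$, and (6) to $1-m_2\le 3\delta_2(1-m_1)$,
so that

$$\frac{1-m_2}{3\delta_2}\ \le\ 1-m_1\ \le\ 3\delta_1(1-m_2).\tag{7}$$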
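The rearrangement from (5) and (6) to (7) can be checked with exact rational arithmetic. The sketch below (hypothetical function names) evaluates both forms on a grid of strategy pairs and discount factors, using Python's `fractions.Fraction` so that boundary cases are not disturbed by rounding; it assumes $\delta_2>0$, which (7) needs in order to divide by $3\delta_2$.

```python
from fractions import Fraction
from itertools import product

def satisfies_5_and_6(m1, m2, d1, d2):
    """Stationary equilibrium test written exactly as conditions (5) and (6)."""
    c5 = 3 + m1 - 3 * m2 >= (1 - d1) * (4 - 3 * m2) + d1
    c6 = 3 + m2 - 3 * m1 >= (1 - d2) * (4 - 3 * m1) + d2
    return c5 and c6

def satisfies_7(m1, m2, d1, d2):
    """The same test in the unified form (7); requires d2 > 0."""
    return (1 - m2) / (3 * d2) <= 1 - m1 <= 3 * d1 * (1 - m2)

grid = [Fraction(k, 12) for k in range(13)]                  # mixed strategies in [0, 1]
deltas = [Fraction(1, 4), Fraction(1, 3), Fraction(1, 2), Fraction(9, 10)]
mismatches = [
    (m1, m2, d1, d2)
    for m1, m2, d1, d2 in product(grid, grid, deltas, deltas)
    if satisfies_5_and_6(m1, m2, d1, d2) != satisfies_7(m1, m2, d1, d2)
]
print(len(mismatches))  # 0: the two forms agree at every grid point
```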


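In the following figure, we show the image of this set of strategies,
for discount rates $\delta_i$ between 1/3 and 1.

[Figure: the set of stationary equilibrium strategy pairs $(m_1,m_2)$, with boundaries labelled I, II, III, IV]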
We have labelled the boundaries of this region to facilitate translating
these strategy pairs into pairs of payoffs. In region I, we have $m_2=0$,
$m_1\le 1-\frac{1}{3\delta_2}$; in region II, we have $m_1=0$ and $m_2\le 1-\frac{1}{3\delta_1}$; in region III,
we have $m_2=1-\frac{1}{3\delta_1}(1-m_1)$; while in region IV, we have $m_2=1-3\delta_2(1-m_1)$.
In region III, $m_2$ ranges between $1-\frac{1}{3\delta_1}$ and 1, and in region IV between
0 and 1. Inserting these boundary values into the payoff function, we
get the corresponding regions in payoff space:


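I:    $H=(3+m_1,\ 3-3m_1)$;  $m_1\in[0,\,1-\frac{1}{3\delta_2}]$

II:   $H=(3-3m_2,\ 3+m_2)$;  $m_2\in[0,\,1-\frac{1}{3\delta_1}]$

III:  $H=\big(4-3(m_2+\delta_1(1-m_2)),\ m_2+9\delta_1(1-m_2)\big)$;  $m_2\in[1-\frac{1}{3\delta_1},\,1]$

IV:   $H=\big(m_1+9\delta_2(1-m_1),\ 4-3(m_1+\delta_2(1-m_1))\big)$;  $m_1\in[1-\frac{1}{3\delta_2},\,1]$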
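The payoff formulas for regions I to IV can be cross-checked by tracing each boundary in strategy space and evaluating $H$ directly. The sketch below assumes $1/3\le\delta_1,\delta_2\le 1$, so that all four boundary segments lie in the unit square. Its closed forms for regions III and IV are obtained by substituting $m_1=1-3\delta_1(1-m_2)$ and $m_2=1-3\delta_2(1-m_1)$ into $H$; the helper names are hypothetical.

```python
from fractions import Fraction

def H(m1, m2):
    """Stage payoffs (H_1, H_2) of the Prisoner's Dilemma, m_i = Pr(Greedy)."""
    return (3 + m1 - 3 * m2, 3 + m2 - 3 * m1)

def boundary_payoffs(d1, d2, steps=6):
    """Return (strategy pair, payoff via H, payoff via closed form) along
    boundaries I-IV; assumes 1/3 <= d1, d2 <= 1 and exact Fraction inputs."""
    out = []
    for k in range(steps + 1):
        s = Fraction(k, steps)
        # I: m2 = 0, m1 in [0, 1 - 1/(3 d2)]
        m1 = s * (1 - 1 / (3 * d2))
        out.append(((m1, 0), H(m1, 0), (3 + m1, 3 - 3 * m1)))
        # II: m1 = 0, m2 in [0, 1 - 1/(3 d1)]
        m2 = s * (1 - 1 / (3 * d1))
        out.append(((0, m2), H(0, m2), (3 - 3 * m2, 3 + m2)))
        # III: m1 = 1 - 3 d1 (1 - m2), m2 in [1 - 1/(3 d1), 1]
        m2 = 1 - 1 / (3 * d1) + s / (3 * d1)
        m1 = 1 - 3 * d1 * (1 - m2)
        out.append(((m1, m2), H(m1, m2),
                    (4 - 3 * (m2 + d1 * (1 - m2)), m2 + 9 * d1 * (1 - m2))))
        # IV: m2 = 1 - 3 d2 (1 - m1), m1 in [1 - 1/(3 d2), 1]
        m1 = 1 - 1 / (3 * d2) + s / (3 * d2)
        m2 = 1 - 3 * d2 * (1 - m1)
        out.append(((m1, m2), H(m1, m2),
                    (m1 + 9 * d2 * (1 - m1), 4 - 3 * (m1 + d2 * (1 - m1)))))
    return out

print(all(h == f for _, h, f in boundary_payoffs(Fraction(1, 2), Fraction(2, 3))))  # True
```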


We  have  displayed  these  regions  below. 


Turning now to the non-stationary equilibrium outcomes, we observe
that since any outcome in D is also in $H(M)=CH(h(S))$, we have only
to see whether there are any outcomes which are more stable when the
specified sequence is nonstationary. The crucial element in this is the
incentive to defect at any stage. Suppose that we are trying to support
an outcome paying $(y_1,y_2)$: the stationary strategies giving this outcome
specify a repetition of $(m_1^*,m_2^*)$, where

$$m_i^*=\frac{3}{2}-\frac{1}{8}\,[y_i+3y_j],\qquad i=1,2;\ j=2,1,\ j\ne i.$$
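The formula for $m_i^*$ comes from solving the two linear equations $H_1(m_1,m_2)=y_1$ and $H_2(m_1,m_2)=y_2$. A minimal sketch of that inversion follows, with hypothetical helper names; it returns valid probabilities only when $(y_1,y_2)$ lies in the feasible set $H(M)$.

```python
from fractions import Fraction

def m_star(y1, y2):
    """Stationary mixed strategies (m_1*, m_2*) paying (y_1, y_2) in the Prisoner's Dilemma."""
    y1, y2 = Fraction(y1), Fraction(y2)
    return (Fraction(3, 2) - (y1 + 3 * y2) / 8,
            Fraction(3, 2) - (y2 + 3 * y1) / 8)

def H(m1, m2):
    """Stage payoffs (H_1, H_2), m_i = Pr(Greedy)."""
    return (3 + m1 - 3 * m2, 3 + m2 - 3 * m1)

print(m_star(3, 3))   # both Helpful
print(m_star(1, 1))   # both Greedy
print(m_star(4, 0))   # player 1 Greedy, player 2 Helpful
```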

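On the other hand, if $m$ is an infinite sequence with the same payoff, and
if $m$ is the specified sequence of an equilibrium of the supergame, we have four
conditions to satisfy:

$$\sum_{t=1}^{\infty}m_1^t\delta_1^{t-1}-3\sum_{t=1}^{\infty}m_2^t\delta_1^{t-1}=\frac{y_1-3}{1-\delta_1}\tag{8}$$

$$\sum_{t=1}^{\infty}m_2^t\delta_2^{t-1}-3\sum_{t=1}^{\infty}m_1^t\delta_2^{t-1}=\frac{y_2-3}{1-\delta_2}\tag{9}$$

$$\min_T\Big[\sum_{t\ge T}m_1^t\delta_1^{t-T}-3\sum_{t>T}m_2^t\delta_1^{t-T}\Big]\ \ge\ \frac{1-3\delta_1}{1-\delta_1}\tag{10}$$

$$\min_T\Big[\sum_{t\ge T}m_2^t\delta_2^{t-T}-3\sum_{t>T}m_1^t\delta_2^{t-T}\Big]\ \ge\ \frac{1-3\delta_2}{1-\delta_2}\tag{11}$$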

The first two are feasibility conditions, and the last two are
equilibrium conditions derived from conditions i) and ii) of Theorem 4.3.
Since we are interested in the set of outcomes, and not the sequences
that give rise to them, we may assume that there is no stationary
equilibrium paying $(y_1,y_2)$. Thus either

$$(1-9\delta_1)\,y_1\ >\ 4(1-3\delta_1)-3(1-\delta_1)\,y_2\tag{12}$$

or

$$(1-9\delta_2)\,y_2\ >\ 4(1-3\delta_2)-3(1-\delta_2)\,y_1\tag{13}$$

(written without dividing by $1-9\delta_i$, whose sign depends on $\delta_i$).
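That (12) is exactly the failure of player 1's stationary condition (5) at $m^*$ can be confirmed by substituting the formula for $m_i^*$; the sketch below does this on a grid with exact arithmetic. It writes (12) as $(1-9\delta_1)y_1>4(1-3\delta_1)-3(1-\delta_1)y_2$, which avoids dividing by $1-9\delta_1$ (negative for $\delta_1>1/9$). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def m_star(y1, y2):
    """Stationary strategies paying (y1, y2): m_i* = 3/2 - (y_i + 3 y_j)/8."""
    y1, y2 = Fraction(y1), Fraction(y2)
    return (Fraction(3, 2) - (y1 + 3 * y2) / 8, Fraction(3, 2) - (y2 + 3 * y1) / 8)

def violates_5(y1, y2, d1):
    """Player 1's stationary incentive condition (5) fails at m*(y)."""
    m1, m2 = m_star(y1, y2)
    return not (3 + m1 - 3 * m2 >= (1 - d1) * (4 - 3 * m2) + d1)

def condition_12(y1, y2, d1):
    """Condition (12), in the form that does not divide by 1 - 9*d1."""
    return (1 - 9 * d1) * y1 > 4 * (1 - 3 * d1) - 3 * (1 - d1) * y2

ys = [Fraction(k, 4) for k in range(17)]        # payoffs 0, 1/4, ..., 4
deltas = [Fraction(1, 20), Fraction(1, 9), Fraction(1, 3), Fraction(3, 4)]
disagree = [(y1, y2, d) for y1, y2, d in product(ys, ys, deltas)
            if violates_5(y1, y2, d) != condition_12(y1, y2, d)]
print(len(disagree))  # 0
```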
In terms of the strategies used by the players, from the definition of
$m^*$ we know that

$$\sum_{t=1}^{\infty}\big([3+m_1^t-3m_2^t]-[3+m_1^*-3m_2^*]\big)\delta_1^{t-1}=0=\sum_{t=1}^{\infty}\big([3+m_2^t-3m_1^t]-[3+m_2^*-3m_1^*]\big)\delta_2^{t-1}\tag{14}$$

Let us suppose that condition (12) is satisfied, so that it is player 1 who will
defect from the stationary equilibrium. This means that

$$3+m_1^*-3m_2^*\ <\ (1-\delta_1)(4-3m_2^*)+\delta_1,\tag{15}$$

or

$$m_1^*\ <\ 1-3\delta_1+3\delta_1m_2^*.\tag{16}$$

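Condition (14) can be rearranged (using only the left-hand equation) to
give:

$$\sum_{t=1}^{\infty}m_1^t\delta_1^{t-1}-3\sum_{t=1}^{\infty}m_2^t\delta_1^{t-1}=\frac{m_1^*-3m_2^*}{1-\delta_1}.\tag{17}$$

Inserting (16) into this gives

$$\sum_{t=1}^{\infty}m_1^t\delta_1^{t-1}-3\sum_{t=1}^{\infty}m_2^t\delta_1^{t-1}\ <\ \frac{1-3\delta_1}{1-\delta_1}-3m_2^*,\tag{18}$$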


which  contradicts  condition  (10).  Thus,  any  equilibrium  outcome  in  the 
prisoner's  dilemma  can  be  supported  by  a  stationary  equilibrium,  so  the 
sets  defined  above  provide  the  entire  set  of  equilibrium  outcomes. 

V.    Perfect  Equilibrium  in  the  discounted  supergame 

The grim strategies usually fail to provide perfect equilibrium
outcomes, as the following simple example shows. Let the mixed extension
of a game with two players be given by:

$$H(m_1,m_2)=(m_1,\ 2m_1m_2-m_2+m_1-1).$$

The grim strategy secures an equilibrium of the game with the stationary
result $m_1^*=1$, $m_2^*=0$, as long as the discount rate for player 2 is at
least 1/2, and certainly for the undiscounted game. However, once player
2 has defected, it is incredible that player 1, who is in no way injured
by player 2's defection, should actually carry out the grim punishment,
which costs him his entire remaining profit in the game.
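Reading the payoff function as $H(m_1,m_2)=(m_1,\,2m_1m_2-m_2+m_1-1)$, the threshold 1/2 can be recomputed. Player 1 holds player 2 to $v_2=-1$ by playing $m_1=0$, which drives his own payoff to 0, while player 2's best one-shot deviation from $(1,0)$ yields 1. The sketch below (hypothetical names) computes $v_2$ on a grid, exploiting the fact that $H_2$ is linear in $m_2$, and checks condition (4) for player 2 on the stationary path.

```python
from fractions import Fraction

def H1(m1, m2):
    return m1                              # player 1 is unaffected by player 2

def H2(m1, m2):
    return 2 * m1 * m2 - m2 + m1 - 1

def best_reply_value2(m1):
    """max over m2 of H2; H2 is linear in m2, so check the pure strategies 0 and 1."""
    return max(H2(m1, 0), H2(m1, 1))

grid = [Fraction(k, 100) for k in range(101)]
v2 = min(best_reply_value2(m1) for m1 in grid)      # player 2's minmax value on the grid
punisher = min(grid, key=best_reply_value2)         # player 1's minmaxing strategy

def grim_ok_for_2(delta):
    """Condition (4) for player 2 on the stationary path (m1, m2) = (1, 0)."""
    stay = H2(1, 0) / (1 - delta)
    return stay - best_reply_value2(1) >= delta * v2 / (1 - delta)

print(v2, punisher, H1(punisher, 0))   # punishing drives player 1's own payoff to 0
print([d for d in (Fraction(2, 5), Fraction(1, 2), Fraction(3, 5)) if grim_ok_for_2(d)])
```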

In  the  undiscounted  game,  the  requirement  of  perfection  does  not 
actually  affect  the  set  of  payoffs  sustained  by  equilibrium  behavior,  a 
result discovered independently by Rubinstein and by Aumann and Shapley.
We  shall  call  this  the  "perfectness  Folk  Theorem"  and  include  a  simple 
proof  for  the  present  model. 

5.1  Theorem:  The  set  of  limiting  average  payoffs  to  perfect  equilibria 
of  the  undiscounted  supergame  coincides  with  the  set  of  payoffs  to  equi- 
libria of  the  undiscounted  supergame. 

Proof: Let $y\in CH(h(S))$ be a feasible and individually-rational payoff:
$y_i\ge v_i$ for each player $i$. We know by Theorem 4.2 that there exists an
equilibrium of the undiscounted game with limiting average payoff
exactly $y$. Let us denote the specified sequence of this equilibrium by
$m(y)=[m^1(y),\dots,m^t(y),\dots]$. By the properties of the Cesàro mean,
for any finite $T$, we have

$$\lim_{t\to\infty}\frac{1}{t}\sum_{\tau=T}^{t}H(m^\tau(y))=y.\tag{19}$$
Thus, nothing that happens in finite time affects the limiting average
payoff. Now, let $f$ be any $n$-tuple of supergame strategies, and $m^1,\dots,m^t$
an arbitrary history of length $t$. We define the set of last defectors
from $f$ according to $m=(m^1,\dots,m^t)$, $LD(f,m)$, and the time of last
defection, $t^*(f,m)$, as follows:

$$t^*(f,m)=\begin{cases}\max\{t'\le t:\ m^{t'}\ne f^{t'}(m^1,\dots,m^{t'-1})\} & \text{if it exists}\\ t+1 & \text{otherwise}\end{cases}$$

$$LD(f,m)=\begin{cases}\{i\in N:\ m_i^{t^*}\ne f_i^{t^*}(m^1,\dots,m^{t^*-1})\} & \text{if } t^*=t^*(f,m)\le t\\ \emptyset & \text{otherwise}\end{cases}$$


Now let $\varepsilon=(\varepsilon_t)$ be an infinite sequence of positive numbers with $\varepsilon_t=o(1)$, that is,
$\varepsilon_t\to 0$. We shall define the notion of a "debt to society" by stipulating that a
player who defects at time $t$ is to be punished until his cumulative
average payoff is within $\varepsilon$ of his minmax payoff, at which point play
returns to the specified sequence, or until another player or players
defect. If another single player $j$ defects at time $t'$ subsequent to $t$,
then $j$ is to be punished to within $\varepsilon$ of his minmax payoff; if more
than one player defects simultaneously, play returns to the cooperative
sequence. To implement this idea, we must remove from $LD(f,m)$ those
players who have paid their debt to society. The remaining criminals
as of time $t$ form a set $C^t(f,m)$ defined by:

$$C^t(f,m)=\Big\{i\in LD(f,m):\ \frac{1}{t}\sum_{\tau<t}H_i(m^\tau)>v_i+\varepsilon_{t^*(f,m)}\Big\}$$
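The bookkeeping behind $t^*(f,m)$, $LD(f,m)$ and $C^t(f,m)$ can be made concrete for finite histories. In the sketch below (our own illustration, not code from the paper) a strategy profile $f$ is a Python function from a history prefix to a prescribed stage profile. Stages are numbered from 1, players are indexed from 0, and the criminal set at stage $t$ is computed from the history $m^1,\dots,m^{t-1}$ with the $1/t$ normalisation as printed. All names, and the constant strategy used in the example, are hypothetical.

```python
from fractions import Fraction

def last_defection(f, history):
    """t*(f, m): latest stage t' (1-based) whose profile differs from what f
    prescribed given the earlier history; len(history) + 1 if there is none."""
    for t in range(len(history), 0, -1):
        if history[t - 1] != f(history[:t - 1]):
            return t
    return len(history) + 1

def last_defectors(f, history):
    """LD(f, m): the players (0-based indices) who deviated at stage t*(f, m)."""
    t_star = last_defection(f, history)
    if t_star > len(history):
        return set()
    prescribed = f(history[:t_star - 1])
    return {i for i, (a, b) in enumerate(zip(history[t_star - 1], prescribed)) if a != b}

def criminals(f, history, payoffs, v, eps):
    """C^t(f, m) at stage t = len(history) + 1: last defectors whose average payoff
    (1/t) * sum_{tau < t} H_i(m^tau) still exceeds v_i + eps(t*)."""
    t = len(history) + 1
    t_star = last_defection(f, history)
    return {i for i in last_defectors(f, history)
            if Fraction(sum(payoffs[i](m) for m in history), t) > v[i] + eps(t_star)}

def f(prefix):
    return (0, 0)          # hypothetical profile: always prescribe mutual "Helpful" play

def eps(t):
    return Fraction(1, t)  # hypothetical trigger margins tending to 0

payoffs = [lambda m: 3 + m[0] - 3 * m[1], lambda m: 3 + m[1] - 3 * m[0]]
v = [1, 1]

history = [(0, 0), (1, 0), (0, 0)]
print(last_defection(f, history), last_defectors(f, history))  # player 1 (index 0) defected at stage 2
print(criminals(f, history, payoffs, v, eps))
```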
We can now define a perfect equilibrium strategy with outcome $y$:

$$f_i^1=m_i^1(y)\quad\text{for all } i,$$

$$f_i(m^1,\dots,m^{t-1})=f_i(m)=\begin{cases}p_i^j & \text{iff } C^t(f,m)=\{j\}\\ m_i^t(y) & \text{otherwise}\end{cases}$$


The following are consequences of this definition. First, $m(f)=m(y)$, so that
the specified sequence does indeed have the outcome $y$. If player $i$
contemplates defecting for at most a finite number of periods, his limiting
average payoff will be $y_i$, by equation (19) and the fact that, if the players
adhere to $f$, $C^t(f,m)$ is always empty a finite number of periods after any
last defection. In other words, since each stage of punishment adds at most the
amount $v_i$ to player $i$'s cumulative payoff, his cumulative average payoff falls
to the trigger level $\varepsilon_{t^*(f,m)}+v_i$ within finite time after $t^*(f,m)$.
If player $i$ defects an infinite number of times, then the strategy calls
for him to be punished forever, since the trigger level approaches $v_i$.
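To see why a punishment phase ends in finite time, consider a hypothetical run in the Prisoner's Dilemma, where $v_i=1$. Player 1 cooperates for 9 stages (payoff 3 each), defects at stage 10 (payoff 4), and is then held to his minmax payoff 1 per stage. With a trigger margin of 1/2, his cumulative average $(31+k)/(10+k)$ first falls to $1+1/2$ after $k=32$ punishment stages. The sketch below counts these stages for arbitrary inputs, assuming for illustration that the average is taken over all stages played so far; the numbers and names are ours, not the paper's.

```python
from fractions import Fraction

def punishment_length(past_payoffs, punished_payoff, v, eps):
    """Number of punishment stages before the defector's cumulative average
    payoff first falls to v + eps (his 'debt to society' is then paid).

    past_payoffs    -- the defector's stage payoffs up to and including the defection
    punished_payoff -- what he receives per stage while punished (at most v)
    Assumes punished_payoff < v + eps, which guarantees the loop terminates.
    """
    total, count, k = Fraction(sum(past_payoffs)), len(past_payoffs), 0
    while total / count > v + eps:
        total += punished_payoff
        count += 1
        k += 1
    return k

history = [3] * 9 + [4]   # nine cooperative stages, then a defection
print(punishment_length(history, 1, 1, Fraction(1, 2)))
print([punishment_length(history, 1, 1, Fraction(1, n)) for n in (2, 4, 8)])  # grows as eps shrinks
```

The count is finite for every positive margin but grows without bound as the margin shrinks, which is why a player who defects infinitely often (so that the margins $\varepsilon_{t^*}$ tend to 0) ends up punished forever.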

Now, consider any subgame in which players are punishing player $i$. If
player $j$ decides to defect by not playing $p_j^i$, then player $j$ is punished.
If $j$ defects forever, he is held to $v_j$; if he defects a finite number of
times, play returns to $m(y)$ and his payoff is $y_j$; if he does not defect,
the punishment of player $i$ ends in finite time, so his payoff is $y_j$.
Since $y_j\ge v_j$, none of these deviations is profitable for $j$.
Therefore, any feasible and individually-rational $y$ can be sustained
as a perfect equilibrium outcome by such a strategy. Since every perfect
equilibrium outcome is a fortiori an equilibrium outcome, it follows
that the set of perfect equilibrium outcomes coincides with the set of
equilibrium outcomes. QED
This happy state of affairs does not persist in the discounted game,
as we can show through analysis of a simple example related to the
problem of strategic control of externalities.

5.2 Example: There are two players. In the stage game, player 1 can
take a level of precaution $s_1\in[0,1]$. It costs him nothing to take
this precaution, and the resulting social cost is $C(1-s_1)$, where $C$
is a large positive number. Player 2 cannot take any precaution, but
can compensate player 1 by paying him an amount $s_2$ from her initial
wealth of 1. Player 1 is liable for a constant share, $L\in(0,1)$, of the
social cost, and player 2 pays the balance. The payoffs to the two
players are therefore:

$$h_1(s_1,s_2)=s_2-L(1-s_1)C$$

$$h_2(s_1,s_2)=1-s_2-(1-L)(1-s_1)C$$

In the one-shot game there is a unique equilibrium at $s_1=1=1-s_2$.
In the undiscounted supergame, we can define the set of strong equilibrium
outcomes to be the set of Pareto Optimal equilibrium outcomes; in general,
a strong equilibrium is a situation from which no coalition can defect
making all of its members better off. Here, the only non-singleton
coalition is the pair $\{1,2\}$. A strong equilibrium outcome of the
undiscounted supergame is a pair of net wealths $(a_1,a_2)$ s.t.

i)    $a_1+a_2=1$   (Pareto Optimality)

ii)   $a_1\ge 0$   (individual rationality for player 1)

iii)  $a_2\ge 1-(1-L)C$   (individual rationality for player 2)
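The minmax values behind conditions ii) and iii) can be checked numerically. In the sketch below (hypothetical names and parameter values) both payoffs are linear in each player's own action, and the inner maxima are linear in the opponent's action as well, so every optimisation can be restricted to the endpoints 0 and 1.

```python
from fractions import Fraction

def h1(s1, s2, L, C):
    return s2 - L * (1 - s1) * C

def h2(s1, s2, L, C):
    return 1 - s2 - (1 - L) * (1 - s1) * C

def minmax_values(L, C):
    """Minmax payoffs (v_1, v_2) of Example 5.2, optimising over the endpoints {0, 1}."""
    ends = (0, 1)
    v1 = min(max(h1(s1, s2, L, C) for s1 in ends) for s2 in ends)
    v2 = min(max(h2(s1, s2, L, C) for s2 in ends) for s1 in ends)
    return v1, v2

L, C = Fraction(1, 2), 10                 # hypothetical parameter values
print(minmax_values(L, C))                # v_1 = 0 and v_2 = 1 - (1 - L)C
# One-shot play: s1 = 1 maximises h1 for any s2, and s2 = 0 maximises h2 for any s1.
print(h1(1, 0, L, C), h2(1, 0, L, C))     # the one-shot payoffs
```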
The set of equilibrium outcomes can be found by replacing condition i) by

i')   $a_1+a_2\le 1$   (feasibility)

It is clear that player 1 can use the threat of diminishing his
precaution to extract some money from player 2. Moreover, by Theorem 5.1 we
know that this threat is credible in the undiscounted game, so
conditions i)-iii) also give us the set of strong perfect equilibrium outcomes
for the undiscounted game. Now let us move to the discounted game, with
both players using the same discount rate, $d\in(0,1)$. From Theorem 4.3,
we know that $(a_1,a_2)$ is a Pareto Optimal outcome of an equilibrium
of the discounted game iff it satisfies i), ii), and

iv)   $1-a_2=a_1\le d(1-L)C$

since by withholding her payment player 2 saves $a_1/(1-d)$, while the
punishment costs her $d(1-L)C/(1-d)$.
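Condition iv) is player 2's grim-strategy condition on the stationary path $s=(1,a_1)$: complying is worth $(1-a_1)/(1-d)$, while withholding the payment yields 1 now and her minmax payoff $1-(1-L)C$ in every later period. The sketch below (hypothetical names and parameter values) compares the two values exactly and confirms that compliance holds precisely when $a_1\le d(1-L)C$.

```python
from fractions import Fraction

def player2_complies(a1, L, C, d):
    """Grim-strategy condition for player 2 on the stationary path s = (1, a1):
    paying a1 forever must be worth at least withholding it now and then
    receiving her minmax payoff 1 - (1 - L)C in every later period."""
    comply = (1 - a1) / (1 - d)
    deviate = 1 + d * (1 - (1 - L) * C) / (1 - d)
    return comply >= deviate

L, C, d = Fraction(1, 2), 10, Fraction(1, 10)      # hypothetical values: d(1 - L)C = 1/2
print([a for a in (Fraction(2, 5), Fraction(1, 2), Fraction(3, 5)) if player2_complies(a, L, C, d)])
```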

Now suppose that player 1 wishes to punish player 2 for some defection
by playing the punishment sequence $(s_1^1,\dots,s_1^t,\dots)$. The ratio of the
cost to player 1 of this sequence to the punishment inflicted on
player 2 is:

$$\frac{\sum_{t=1}^{\infty}d^{\,t-1}\,[(1-s_1^t)LC]}{\sum_{t=1}^{\infty}d^{\,t-1}\,[(1-s_1^t)(1-L)C]}=\frac{L}{1-L}.$$

It follows that player 1 may as well react immediately to defection with a
punishment sufficient to have deterred defection in the first place. How
strong must this punishment be?

If the outcome of the perfect equilibrium is to be the pair $(a,1-a)$,
it is easy to see that the cheapest punishment sequence sufficient to
prevent defection is $(s_1,1,1,\dots)$, where the payment $a$ that player 2
saves by defecting must not exceed the discounted cost $d(1-s_1)(1-L)C$ of
the following period's punishment:

$$(1-s_1)(1-L)C\ \ge\ \frac{a}{d}.\tag{20}$$
We must now see whether player 1 will be willing to execute this
punishment. By the argument given above, player 1 has two alternatives: use
the specified punishment at once, or hold off forever, for a payoff of 0
in each period. The condition for carrying out the punishment is
therefore

$$(1-s_1)LC\ \le\ \frac{da}{1-d}.\tag{21}$$

Combining this with (20) gives us the condition for the equilibrium
outcome $(a,1-a)$ to be sustainable as the outcome of a stationary Pareto
Optimal perfect equilibrium of the discounted supergame:

$$\frac{da}{(1-d)LC}\ \ge\ \frac{a}{(1-L)dC}\iff\frac{d^2}{1-d}\ \ge\ \frac{L}{1-L},\qquad\text{for } a\ne 0,\ L\in(0,1),\ d\in(0,1).\tag{22}$$
Combining this condition with i), ii), and iv) gives us the set of (strong)
perfect equilibrium outcomes of the discounted supergame. The set of
equilibrium outcomes is as shown below:

[Figure: the set of equilibrium outcomes, with the bound $1-d(1-L)C$ marked]
The set of perfect equilibrium outcomes is equal to the set of equilibrium
outcomes if condition (22) is satisfied, and is equal to the single
outcome (0,1) otherwise. We should remark that an argument similar to that
used in analysing the set of equilibrium outcomes for the Prisoner's
Dilemma lets us confine our attention to stationary outcomes of equilibria
in this game.
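Condition (22) can be checked against (20) and (21) directly: (20) is a lower bound and (21) an upper bound on $x=1-s_1$, and a suitable $s_1\in[0,1]$ exists exactly when the two bounds are compatible. The sketch below (hypothetical names, illustrative value of $C$) restricts attention to payments $0<a\le d(1-L)C$, the range in which (20) can be met with $s_1\ge 0$. It confirms on a grid that existence of such an $s_1$ coincides with $d^2/(1-d)\ge L/(1-L)$.

```python
from fractions import Fraction
from itertools import product

def punishment_exists(a, L, C, d):
    """Is there an s1 in [0, 1] satisfying (20) and (21)?  Both bound x = 1 - s1,
    which must also lie in [0, 1]."""
    lower = a / (d * (1 - L) * C)          # from (20)
    upper = d * a / ((1 - d) * L * C)      # from (21)
    return max(lower, 0) <= min(upper, 1)

def condition_22(L, d):
    return d**2 / (1 - d) >= L / (1 - L)

C = 10                                     # hypothetical social-cost scale
cases = product([Fraction(k, 10) for k in range(1, 10)],   # L
                [Fraction(k, 10) for k in range(1, 10)],   # d
                [Fraction(k, 4) for k in range(1, 5)])     # a as a share of d(1 - L)C
mismatch = []
for L, d, share in cases:
    a = share * d * (1 - L) * C            # 0 < a <= d(1 - L)C
    if punishment_exists(a, L, C, d) != condition_22(L, d):
        mismatch.append((L, d, a))
print(len(mismatch))  # 0
```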

Since  there  is  no  hope  for  a  general  result  such  as  Theorem  5.1  for 
the  discounted  case,  we  close  by  presenting  a  sufficient  condition  for 
the set of perfect equilibrium outcomes to coincide with the set of
equilibrium  outcomes.   This  condition  turns  out  to  be  satisfied  by  quite 
a  few  games  of  economic  interest. 

5.3 Theorem: Let $[N,M,H]$ be the mixed extension of a stage game.
Suppose that for each player $i$ there exists an $n$-tuple $\bar m^i\in M$ of mixed
strategies for the stage game with the following properties:

i)    $\max_{m_i}H_i(m_i,\bar m^i_{-i})=H_i(\bar m^i)=v_i$

ii)   for all $j\ne i$, $H_j(\bar m^i)\ \ge\ \delta_jv_j+(1-\delta_j)\max_{m_j}H_j(m_j,\bar m^i_{-j})$

Then the set of outcomes sustainable by perfect equilibria of the
discounted supergame coincides with the set of outcomes sustainable by
equilibria of the discounted supergame.

Proof: Let $y$ be an outcome sustainable by a grim-strategy equilibrium
of the discounted supergame, and $m(y)$ the associated specified sequence.
Recalling the definition of the last defectors from a strategy $f$ given
a partial history $m$, used in the proof of Theorem 5.1, we define a perfect
equilibrium strategy $f$ by

$$f_i^1=m_i^1(y),$$

$$f_i(m^1,\dots,m^{t-1})=f_i(m)=\begin{cases}\bar m_i^j & \text{iff } LD(f,m)=\{j\}\\ m_i^t(y) & \text{otherwise}\end{cases}$$

To see that this is indeed a perfect equilibrium strategy combination,
we first observe that $m(f)=m(y)$, so that adherence to this strategy
results in a payoff of $y$. Secondly, notice that this strategy calls for
a grim punishment to be inflicted on any defector, regardless of whether
that defection occurred while playing the specified sequence or a
punishment sequence. It follows that no player will wish to unilaterally
defect from the specified sequence. Now suppose that we are playing a
punishment sequence. The condition of the theorem states that for each
player $i$ there is a way of implementing the grim punishment via a
stationary equilibrium of the discounted supergame: by condition i) the
punished player cannot improve his payoff in any stage; and by
condition ii) no other player will find it in his interest to defect from
the new stationary equilibrium. QED

As a special case, we remark that the condition is satisfied if there
is an equilibrium of the stage game that gives each player his minmax
payoff: taking that equilibrium as the punishment tuple for every player,
condition i) holds because it is an equilibrium paying $v_i$, and condition ii)
holds with equality. This is clearly the case with Prisoner's Dilemma,
and also with many economic exchange games. In the latter, a player's
security level $v_i$ is almost always the same as his payoff at the
no-trade point, so if there is a no-trade equilibrium, the Theorem applies.
Examples include: Kurz' "Altruism games"; the Shapley-Shubik-Dubey family
of exchange games; and Wilson's Competitive bidding model.
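Conditions i) and ii) of Theorem 5.3 can be checked mechanically for a two-player game once candidate punishment profiles (one per player) are given. The sketch below (hypothetical names) assumes payoffs that are linear in each player's own strategy, so best replies can be read off at 0 and 1. For the Prisoner's Dilemma, mutual Greedy play works for both players. For Example 5.2, the only profile holding player 2 to $v_2$ is $s=(0,0)$, where player 1 would rather take full precaution, so condition ii) fails. This is consistent with the example's conclusion that perfection can remove equilibrium outcomes there.

```python
from fractions import Fraction

def satisfies_theorem_5_3(payoffs, punish, v, delta):
    """Check conditions i) and ii) of Theorem 5.3 for a two-player game.

    payoffs[k](m) -- H_k at the profile m = (m_1, m_2), each m_k in [0, 1]
    punish[i]     -- the profile used to hold player i to v[i]
    Payoffs are assumed linear in each player's own strategy (true of any
    mixed extension of a 2x2 game), so best replies are checked at 0 and 1.
    """
    def best_reply_value(k, m):
        return max(payoffs[k](tuple(x if j == k else m[j] for j in range(2)))
                   for x in (0, 1))

    for i in range(2):
        m = punish[i]
        if not (best_reply_value(i, m) == payoffs[i](m) == v[i]):    # condition i)
            return False
        for j in range(2):
            bound = delta[j] * v[j] + (1 - delta[j]) * best_reply_value(j, m)
            if j != i and payoffs[j](m) < bound:                     # condition ii)
                return False
    return True

# Prisoner's Dilemma (m_k = Pr(Greedy)): mutual Greedy play holds both players to v = 1.
pd = [lambda m: 3 + m[0] - 3 * m[1], lambda m: 3 + m[1] - 3 * m[0]]
print(satisfies_theorem_5_3(pd, [(1, 1), (1, 1)], [1, 1], [Fraction(1, 2)] * 2))
```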